
    Optimizing Weights And Biases in MLP Using Whale Optimization Algorithm

    Artificial Neural Networks are intelligent, non-parametric mathematical models inspired by the human nervous system. They have been widely studied and applied to classification, pattern recognition and forecasting problems. The main challenge of training an Artificial Neural Network lies in its learning process: the nonlinear nature of the model and the unknown best set of controlling parameters (weights and biases). When Artificial Neural Networks are trained with conventional training algorithms, they tend to get caught in local optima and suffer from slow convergence; this makes stochastic optimization algorithms a compelling alternative to alleviate these drawbacks. This thesis proposes an algorithm based on the recently introduced Whale Optimization Algorithm (WOA), which has been shown to solve a wide range of optimization problems and outperform existing algorithms. Its successful application motivated our attempt to benchmark its performance in training feed-forward neural networks. We test the proposed WOA-MLP trainer on a set of 20 datasets of varying difficulty, and verify the results by comparing WOA-MLP with the back-propagation algorithm and six evolutionary techniques. The results show that the proposed trainer outperforms the current algorithms on the majority of datasets in terms of local optima avoidance and convergence speed.
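    The training scheme described above can be sketched in a few lines. The sketch below is illustrative only: it flattens a one-hidden-layer MLP's weights and biases into a single vector, uses mean squared error on a toy dataset as the whale fitness, and applies the standard WOA position updates (encircling, random search and spiral, here with scalar coefficients for simplicity). The data, layer sizes and hyperparameters are assumptions, not the thesis setup.

    ```python
    import numpy as np

    rng = np.random.default_rng(0)

    # Toy binary-classification data standing in for the benchmark datasets.
    X = rng.normal(size=(200, 4))
    y = (X[:, 0] + X[:, 1] > 0).astype(float)

    n_in, n_hid = 4, 6
    dim = n_in * n_hid + n_hid + n_hid + 1      # all weights and biases, flattened

    def mlp_loss(w):
        """Fitness: mean squared error of the MLP encoded by flat vector w."""
        W1 = w[:n_in * n_hid].reshape(n_in, n_hid)
        b1 = w[n_in * n_hid:n_in * n_hid + n_hid]
        W2, b2 = w[-n_hid - 1:-1], w[-1]
        h = np.tanh(X @ W1 + b1)
        out = 1 / (1 + np.exp(-(h @ W2 + b2)))
        return np.mean((out - y) ** 2)

    n_whales, n_iter, b_spiral = 30, 200, 1.0
    pop = rng.uniform(-1, 1, size=(n_whales, dim))
    best = min(pop, key=mlp_loss).copy()

    for t in range(n_iter):
        a = 2 - 2 * t / n_iter                  # decreases linearly from 2 to 0
        for i in range(n_whales):
            A, C = 2 * a * rng.random() - a, 2 * rng.random()
            if rng.random() < 0.5:
                if abs(A) < 1:                  # exploit: encircle the best whale
                    pop[i] = best - A * np.abs(C * best - pop[i])
                else:                           # explore: move towards a random whale
                    rand = pop[rng.integers(n_whales)]
                    pop[i] = rand - A * np.abs(C * rand - pop[i])
            else:                               # spiral update around the best whale
                l = rng.uniform(-1, 1)
                pop[i] = np.abs(best - pop[i]) * np.exp(b_spiral * l) \
                         * np.cos(2 * np.pi * l) + best
        cand = min(pop, key=mlp_loss)
        if mlp_loss(cand) < mlp_loss(best):
            best = cand.copy()

    print("final training MSE:", mlp_loss(best))
    ```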

    A Brief Survey of Deep Learning Approaches for Learning Analytics on MOOCs

    Massive Open Online Course (MOOC) systems have become prevalent in recent years and have drawn increasing attention, among other reasons due to the impact of the coronavirus pandemic. However, the chance of dropout is well known to be higher in MOOCs than in conventional offline courses. Researchers have implemented extensive methods to explore the reasons behind learner attrition or lack of interest, in order to apply timely interventions. The recent success of neural networks has revolutionised a wide range of Learning Analytics (LA) tasks, and the associated deep learning techniques are increasingly deployed to address the dropout prediction problem. This survey gives a timely and succinct overview of deep learning techniques for learning analytics on MOOCs. We mainly analyse the trends in feature processing and model design for dropout prediction. Moreover, we present the recent incremental improvements over existing deep learning techniques and the commonly used public data sets. Finally, the paper proposes three future research directions for the field: knowledge graphs with learning analytics, comprehensive social network analysis, and composite behavioural analysis.
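    As a minimal illustration of the dropout-prediction formulation the survey covers (not code from any surveyed paper), a recurrent model can classify a learner's weekly activity sequence into dropout or completion; the feature layout below is a toy assumption.

    ```python
    import torch
    import torch.nn as nn

    class DropoutPredictor(nn.Module):
        """Toy recurrent dropout classifier over weekly activity features."""
        def __init__(self, n_features=7, hidden=32):
            super().__init__()
            self.rnn = nn.LSTM(n_features, hidden, batch_first=True)
            self.head = nn.Linear(hidden, 1)

        def forward(self, x):                       # x: (batch, weeks, features)
            _, (h, _) = self.rnn(x)
            return torch.sigmoid(self.head(h[-1]))  # probability of dropping out

    # Hypothetical batch: 16 learners, 4 weeks, 7 activity counts per week
    # (e.g., video views, forum posts, quiz attempts).
    model = DropoutPredictor()
    print(model(torch.randn(16, 4, 7)).shape)       # torch.Size([16, 1])
    ```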

    Language as a latent sequence: Deep latent variable models for semi-supervised paraphrase generation

    This paper explores deep latent variable models for semi-supervised paraphrase generation, where the missing target pair for unlabelled data is modelled as a latent paraphrase sequence. We present a novel unsupervised model named variational sequence auto-encoding reconstruction (VSAR), which performs latent sequence inference given an observed text. To leverage information from text pairs, we additionally introduce a novel supervised model we call dual directional learning (DDL), which is designed to integrate with the proposed VSAR model. Combining VSAR with DDL (DDL+VSAR) enables us to conduct semi-supervised learning; however, the combined model suffers from a cold-start problem. To combat this issue, we propose an improved weight initialisation solution, leading to a novel two-stage training scheme we call knowledge-reinforced learning (KRL). Our empirical evaluations suggest that the combined model yields competitive performance against state-of-the-art supervised baselines on complete data. Furthermore, in scenarios where only a fraction of the labelled pairs are available, our combined model consistently outperforms the strong supervised baseline (DDL) by a significant margin (Wilcoxon test). Our code is publicly available at https://github.com/jialin-yu/latent-sequence-paraphrase
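    A schematic sketch of the semi-supervised objective may help: on labelled pairs, two toy sequence-to-sequence models are trained in both directions (the DDL idea), while for unlabelled text a latent paraphrase is sampled and used to reconstruct the input (the VSAR idea). This is a deliberate simplification under stated assumptions; in particular, the hard argmax sample below replaces the paper's variational inference.

    ```python
    import torch
    import torch.nn as nn

    vocab, dim = 1000, 64

    class Seq2Seq(nn.Module):
        """Toy encoder-decoder standing in for the paper's components."""
        def __init__(self):
            super().__init__()
            self.emb = nn.Embedding(vocab, dim)
            self.enc = nn.GRU(dim, dim, batch_first=True)
            self.dec = nn.GRU(dim, dim, batch_first=True)
            self.out = nn.Linear(dim, vocab)

        def forward(self, src, tgt):
            _, h = self.enc(self.emb(src))
            dec_out, _ = self.dec(self.emb(tgt), h)
            return self.out(dec_out)                # logits over the vocabulary

    x2y, y2x = Seq2Seq(), Seq2Seq()                 # the two directions (DDL)
    ce = nn.CrossEntropyLoss()

    def supervised_loss(x, y):
        """DDL-style loss: translate labelled pairs in both directions."""
        return (ce(x2y(x, y).transpose(1, 2), y) +
                ce(y2x(y, x).transpose(1, 2), x))

    def unsupervised_loss(x):
        """VSAR-style loss: sample a latent paraphrase, then reconstruct x.
        The hard argmax sample is a simplification of variational inference."""
        with torch.no_grad():
            y_lat = x2y(x, x).argmax(-1)            # crude latent paraphrase
        return ce(y2x(y_lat, x).transpose(1, 2), x)

    x = torch.randint(0, vocab, (8, 12))            # unlabelled batch (token ids)
    xp = torch.randint(0, vocab, (8, 12))           # labelled source batch
    yp = torch.randint(0, vocab, (8, 12))           # labelled paraphrase batch
    loss = supervised_loss(xp, yp) + unsupervised_loss(x)
    loss.backward()
    print(loss.item())
    ```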

    INTERACTION: A Generative XAI Framework for Natural Language Inference Explanations

    Explainable AI (XAI) with natural language processing aims to produce human-readable explanations as evidence for AI decision-making, addressing explainability and transparency. However, from an HCI perspective, current approaches focus on delivering a single explanation, which fails to account for the diversity of human thoughts and experiences in language. This paper addresses this gap by proposing a generative XAI framework, INTERACTION (explaIn aNd predicT thEn queRy with contextuAl CondiTional varIational autO-eNcoder). Our framework presents explanations in two steps: (step one) Explanation and Label Prediction; and (step two) Diverse Evidence Generation. We conduct extensive experiments with the Transformer architecture on a benchmark dataset, e-SNLI. Our method achieves competitive or better performance against state-of-the-art baseline models on explanation generation (up to 4.7% gain in BLEU) and prediction (up to 4.4% gain in accuracy) in step one; it can also generate multiple diverse explanations in step two.
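    The diverse-generation step can be illustrated with a small conditional variational autoencoder sketch (assumptions mine, not the INTERACTION implementation): the decoder is conditioned on the encoded premise-hypothesis context, and drawing several latent samples z yields several distinct explanations for the same input.

    ```python
    import torch
    import torch.nn as nn

    vocab, dim, z_dim = 1000, 64, 16

    class ExplanationCVAE(nn.Module):
        def __init__(self):
            super().__init__()
            self.emb = nn.Embedding(vocab, dim)
            self.ctx_enc = nn.GRU(dim, dim, batch_first=True)
            self.prior = nn.Linear(dim, 2 * z_dim)    # mu, logvar of p(z|context)
            self.dec = nn.GRU(dim + z_dim, dim, batch_first=True)
            self.out = nn.Linear(dim, vocab)

        @torch.no_grad()
        def generate(self, context, length=10, n_samples=3):
            _, h = self.ctx_enc(self.emb(context))
            mu, logvar = self.prior(h[-1]).chunk(2, dim=-1)
            explanations = []
            for _ in range(n_samples):                # one latent draw per explanation
                z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()
                tok = torch.zeros(context.size(0), dtype=torch.long)  # assumed <bos>=0
                hid, seq = h, []
                for _ in range(length):               # greedy token-by-token decoding
                    inp = torch.cat([self.emb(tok), z], -1).unsqueeze(1)
                    o, hid = self.dec(inp, hid)
                    tok = self.out(o[:, 0]).argmax(-1)
                    seq.append(tok)
                explanations.append(torch.stack(seq, 1))
            return explanations                       # n_samples sequences per input

    model = ExplanationCVAE()
    ctx = torch.randint(0, vocab, (2, 20))     # toy encoded premise+hypothesis pair
    print([e.shape for e in model.generate(ctx)])  # 3 tensors of shape (2, 10)
    ```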

    A Generative Bayesian Graph Attention Network for Semi-supervised Classification on Scarce Data

    This research focuses on semi-supervised classification tasks, specifically for graph-structured data in data-scarce situations. Conventional supervised graph convolutional models are known to perform poorly at classification when only a small fraction of the nodes are labelled. Additionally, most existing graph neural network models ignore the noise in graph generation and treat all relations between objects as genuine ground truth; hence missing edges are overlooked while spurious edges are included. Addressing these challenges, we propose a Bayesian Graph Attention model which treats the observed graph as a random sample from a generative model. The method infers the joint posterior distribution of node labels and graph structure by combining the Mixed-Membership Stochastic Block Model with the Graph Attention Model. We adopt a variety of approximation methods to estimate the Bayesian posterior distribution of the missing labels. The proposed method is comprehensively evaluated on three graph-based deep learning benchmark data sets. The experimental results demonstrate that our proposed model, BGAT, performs competitively against current state-of-the-art models on semi-supervised node classification when few labels are available (the highest improvement is 5%).
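    The core Bayesian idea can be sketched as follows (an illustrative simplification, not the BGAT code): treat the observed graph as one sample from a generative model, draw several perturbed graphs, run a graph attention layer on each, and average the predictive distributions. The crude edge-resampling below stands in for posterior inference over the Mixed-Membership Stochastic Block Model.

    ```python
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class GATLayer(nn.Module):
        """Single-head graph attention layer (simplified)."""
        def __init__(self, d_in, d_out):
            super().__init__()
            self.W = nn.Linear(d_in, d_out, bias=False)
            self.a = nn.Linear(2 * d_out, 1, bias=False)

        def forward(self, x, adj):
            h = self.W(x)                                  # (N, d_out)
            n = h.size(0)
            pairs = torch.cat([h.repeat_interleave(n, 0),
                               h.repeat(n, 1)], -1).view(n, n, -1)
            e = F.leaky_relu(self.a(pairs).squeeze(-1))    # attention logits
            e = e.masked_fill(adj == 0, float('-inf'))     # neighbours only
            return F.softmax(e, -1) @ h

    def sample_graph(adj, keep=0.9, add=0.02):
        """Crude stand-in for posterior sampling over graphs: randomly drop
        some observed edges and add a few spurious ones; keep self-loops."""
        noise = torch.rand_like(adj)
        sampled = (adj.bool() & (noise < keep)) | (~adj.bool() & (noise < add))
        return (sampled | torch.eye(adj.size(0)).bool()).float()

    n, d, n_classes, n_graphs = 10, 8, 3, 5
    x = torch.randn(n, d)
    adj = (torch.rand(n, n) < 0.3).float()
    layer, head = GATLayer(d, 16), nn.Linear(16, n_classes)

    # Monte Carlo average of predictive distributions over sampled graphs.
    probs = torch.stack([F.softmax(head(layer(x, sample_graph(adj))), -1)
                         for _ in range(n_graphs)]).mean(0)
    print(probs.shape)  # (n_nodes, n_classes)
    ```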

    MONEY: Ensemble learning for stock price movement prediction via a convolutional network with adversarial hypergraph model

    Stock price prediction is a challenge in financial investment, and the AI boom has led to increased interest from researchers. Despite these recent advances, many studies are limited to capturing the time series characteristics of price movement via recurrent neural networks (RNNs), neglecting other critical relevant factors such as industry, shareholders, and news. Graph neural networks, on the other hand, have been applied to a broad range of tasks due to their superior performance in capturing complex relations among entities and in representation learning. This paper investigates the effectiveness of graph neural networks for stock price movement prediction. Inspired by a recent study, we capture complex group-level information (the co-movement of similar companies) via hypergraphs. Unlike other hypergraph studies, we also use a graph model to learn pairwise relations, and we are the first to demonstrate that this simple graph model should be applied before the RNNs rather than after them, as prior research suggested. This way, the subsequent RNNs can learn the long-term dependencies of similar companies, which augments their predictability. We also apply adversarial training to capture the stochastic nature of the financial market and enhance the generalisation of the proposed model. Hence, we contribute a novel ensemble learning framework for stock price movement prediction, named MONEY. It comprises (a) a Graph Convolution Network (GCN) representing pairwise industry and price information and (b) a hypergraph convolution network for group-oriented information transmission via hyperedges, with adversarial training applied by adding perturbations to the inputs before the last prediction layer. Experiments on real-world data demonstrate that MONEY significantly outperforms state-of-the-art methods on average and performs particularly well in bear markets.
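    The pipeline ordering argued for above can be sketched minimally (toy shapes and data, not the MONEY code): a one-layer graph convolution over the pairwise relation graph is applied per day before the RNN consumes the day sequence, and an FGSM-style perturbation is added to the representation feeding the last prediction layer.

    ```python
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    n_stocks, n_days, d = 20, 30, 8
    x = torch.randn(n_days, n_stocks, d)                # daily features per stock
    A = (torch.rand(n_stocks, n_stocks) < 0.2).float()  # pairwise relations (toy)
    A_hat = A + torch.eye(n_stocks)
    A_hat = A_hat / A_hat.sum(-1, keepdim=True)         # row-normalised adjacency

    gcn = nn.Linear(d, d)                               # one-layer GCN stand-in
    rnn = nn.GRU(d, 16, batch_first=True)
    head = nn.Linear(16, 2)                             # price moves down / up

    def encode(x):
        """Graph convolution per day first, then the RNN over the day sequence."""
        g = torch.stack([F.relu(gcn(A_hat @ x_t)) for x_t in x])
        _, h = rnn(g.permute(1, 0, 2))                  # (stocks, days, feat)
        return h[-1]                                    # (n_stocks, 16)

    y = torch.randint(0, 2, (n_stocks,))
    feat = encode(x)
    feat.retain_grad()
    loss = F.cross_entropy(head(feat), y)
    loss.backward(retain_graph=True)                    # grad w.r.t. representation

    # FGSM-style adversarial perturbation before the last prediction layer.
    adv = (feat + 0.01 * feat.grad.sign()).detach()
    loss_adv = F.cross_entropy(head(adv), y)
    print(loss.item(), loss_adv.item())
    ```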

    Contrastive Learning with Heterogeneous Graph Attention Networks on Short Text Classification

    Graph neural networks (GNNs) have attracted extensive interest in text classification tasks due to their expected superior performance in representation learning. However, most existing studies adopt the same semi-supervised learning setting as the vanilla Graph Convolution Network (GCN), which requires a large amount of labelled data during training and is thus less robust when dealing with large-scale graph data with few labels. Additionally, graph structure information is normally captured by direct information aggregation via the network schema and is highly dependent on correct adjacency information, so any missing adjacency knowledge may hinder performance. Addressing these problems, this paper proposes a novel method to learn a graph structure, NC-HGAT, by extending a state-of-the-art self-supervised heterogeneous graph neural network model (HGAT) with simple neighbour contrastive learning. The new NC-HGAT considers the graph structure information from heterogeneous graphs with multilayer perceptrons (MLPs) and delivers consistent results despite corrupted neighbouring connections. Extensive experiments on four benchmark short-text datasets demonstrate that our proposed model, NC-HGAT, significantly outperforms state-of-the-art methods on three datasets and achieves competitive performance on the remaining one.
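    A minimal sketch of neighbour contrastive learning (my simplification, not the NC-HGAT code): an MLP embeds each node from its features alone, and the contrastive loss pulls each node's embedding towards its graph neighbours (positives) while pushing it away from all other nodes (negatives).

    ```python
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    n, d = 12, 16
    x = torch.randn(n, d)                        # node features (e.g., text vectors)
    adj = (torch.rand(n, n) < 0.25).float()      # possibly corrupted neighbour links
    adj.fill_diagonal_(0)

    mlp = nn.Sequential(nn.Linear(d, 32), nn.ReLU(), nn.Linear(32, 32))

    def neighbour_contrastive_loss(z, adj, tau=0.5):
        z = F.normalize(z, dim=-1)
        sim = torch.exp(z @ z.t() / tau)         # pairwise similarity scores
        sim = sim - torch.diag(torch.diag(sim))  # discard self-similarity
        pos = (sim * adj).sum(-1)                # neighbours act as positives
        return -torch.log(pos / sim.sum(-1) + 1e-9).mean()

    loss = neighbour_contrastive_loss(mlp(x), adj)
    loss.backward()
    print(loss.item())
    ```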

    Is Unimodal Bias Always Bad for Visual Question Answering? A Medical Domain Study with Dynamic Attention

    Medical visual question answering (Med-VQA) aims to answer medical questions based on the clinical images provided. This field is still in its infancy due to the complexity of the trio formed by questions, multimodal features and expert knowledge. In this paper, we tackle a 'myth' in the Natural Language Processing area: that unimodal bias is always undesirable in learning models. Additionally, we study the effect of integrating a novel dynamic attention mechanism into such models, inspired by a recent graph deep learning study. Unlike traditional attention, dynamic attention scores are conditioned on the different query words in a question and thus enhance the representation learning ability of texts. We propose that some questions are answered more accurately when the question embedding is reinforced after fusing multimodal features. Extensive experiments on the VQA-RAD dataset demonstrate that our proposed model, reinforCe unimOdal dynamiC Attention (COCA), outperforms the state-of-the-art methods overall and performs competitively at open-ended question answering.
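    The dynamic attention the paper builds on can be sketched as follows (assumptions mine, not the COCA code): the score is a nonlinear function of both the query word and the visual feature, so the ranking of image regions can change from one query word to the next, unlike static bilinear attention.

    ```python
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    d = 64
    W_q, W_v = nn.Linear(d, d), nn.Linear(d, d)
    score = nn.Linear(d, 1)

    def dynamic_attention(q, v):
        """q: (n_words, d) question tokens; v: (n_regions, d) image regions."""
        e = score(torch.tanh(W_q(q)[:, None] + W_v(v)[None, :])).squeeze(-1)
        attn = F.softmax(e, dim=-1)          # per-word distribution over regions
        return attn @ v                      # (n_words, d) attended visual feats

    q = torch.randn(5, d)                    # 5 question words
    v = torch.randn(36, d)                   # 36 detected regions (toy)
    print(dynamic_attention(q, v).shape)     # torch.Size([5, 64])
    ```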